{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6321595e",
   "metadata": {},
   "source": [
    "# Cox生存分析\n",
    "\n",
    "* `mydir`：自己的数据\n",
    "* `ostime_column`: 数据对应的生存时间，不一定非的是OST，也可以是DST、FST等。\n",
    "* `os`：生存状态，不一定非的是OS，也可以是DS、FS等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb0fc498",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lifelines import CoxPHFitter\n",
    "import pandas as pd\n",
    "from onekey_algo import OnekeyDS as okds\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "mydir = okds.survival\n",
    "ostime_column = 'duration'\n",
    "os_column = 'result'\n",
    "data = pd.read_csv(mydir)\n",
    "\n",
    "# 这个地方需要使用自己划分好的数据集\n",
    "data = data.drop('ID', axis=1)\n",
    "train_data, test_data = train_test_split(data, test_size=0.2, random_state=0)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cce536d",
   "metadata": {},
   "source": [
    "## Cox概览\n",
    "\n",
    "所有Cox回归的必要数据，主要关注的数据有3个\n",
    "1. `Concordance`: c-index\n",
    "2. `exp(coef)`: 每个特征对应的HR，同时也有期对应的95%分位数。\n",
    "3. `p`: 表示特征是否显著。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0201324",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "cph = CoxPHFitter()\n",
    "cph.fit(train_data, duration_col=ostime_column, event_col=os_column)\n",
    "\n",
    "cph.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac98781f",
   "metadata": {},
   "source": [
    "#### 输出每个特征的HR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "349aa141",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "cph.plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "872300b2",
   "metadata": {},
   "source": [
    "# 绘制Nomogram\n",
    "\n",
    "要指定绘制nomo图使用到的列，在columns变量。\n",
    "\n",
    "**注意**：survs表示时间分段，需要根据自己的数据情况划分，如果是时间是天数，则3年、5年生存率对应的survs参数为`survs=[3*365, 5*365]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12674616",
   "metadata": {},
   "outputs": [],
   "source": [
    "from onekey_algo.custom.components import nomogram\n",
    "\n",
    "nomogram.nomogram(train_data, duration=ostime_column, result=os_column, columns=['age', 'gender', 'degree', 'Tstage', 'BMI', 'chemotherapy'],\n",
    "                  survs=[36, 60], surv_names=['3 year survival','5 year survival'], with_r=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bf8910e",
   "metadata": {},
   "source": [
    "# KM 曲线\n",
    "\n",
    "根据HR进行分组，计算KM以及log ranktest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "669023fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lifelines import CoxPHFitter\n",
    "from lifelines.statistics import logrank_test\n",
    "from lifelines import KaplanMeierFitter\n",
    "\n",
    "c_index = cph.score(test_data[[c for c in test_data.columns if c != 'ID']], scoring_method=\"concordance_index\")\n",
    "y_pred = cph.predict_partial_hazard(test_data[[c for c in test_data.columns if c != 'ID']])\n",
    "cox_data = pd.concat([test_data, y_pred], axis=1)\n",
    "\n",
    "cox_data['HR'] = cox_data[0] > 1\n",
    "dem = (cox_data[\"HR\"] == True)\n",
    "\n",
    "results = logrank_test(cox_data[ostime_column][dem], cox_data[ostime_column][~dem], \n",
    "                       event_observed_A=cox_data[os_column][dem], event_observed_B=cox_data[os_column][~dem])\n",
    "p_value = results.p_value\n",
    "kmf = KaplanMeierFitter()\n",
    "plt.title(f\"C-index:{c_index:.4f}, p_value={p_value}\")\n",
    "# 分成2组之后绘制KM曲线\n",
    "if sum(dem):\n",
    "    kmf.fit(cox_data[ostime_column][dem], event_observed=cox_data[os_column][dem], label=\"High Rish\")\n",
    "    kmf.plot_survival_function(color='r')\n",
    "if sum(~dem):\n",
    "    kmf.fit(cox_data[ostime_column][~dem], event_observed=cox_data[os_column][~dem], label=\"Low Risk\")\n",
    "    kmf.plot_survival_function(color='g')\n",
    "\n",
    "plt.savefig(f'km_TCGA.svg', bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b79ae1f9",
   "metadata": {},
   "source": [
    "# 根据先验分组\n",
    "\n",
    "根据先验特征进行分组，计算KM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5dcf51b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "cph.plot_partial_effects_on_outcome(covariates='Tstage', values=[0, 1, 2, 3], cmap='coolwarm')\n",
    "plt.xlim(20, 83)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea523994",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
